Finding a Good Collection of Patterns Covering a Set of Sequences

Authors

  • Esko Ukkonen
  • Jaak Vilo
Abstract

The papers in the series are intended for internal use and are distributed by the author. Copies may be ordered from the library of the Department of Computer Science.

We consider the problem of learning unions of pattern languages from positive examples. We consider three different classes of patterns: regular patterns, substring patterns, and the so-called PROSITE patterns. By regular patterns we understand patterns in which each variable symbol can appear only once. By substring patterns we understand the subclass of regular patterns of the type xαy, where x and y are variables and α is a string of constant symbols. PROSITE patterns are a class of patterns used for the classification of bio-sequences in the PROSITE database. We present an algorithm which, given a set of sequences, finds a 'good' collection of patterns 'covering' this set. The notion of a 'good covering' is defined as the most probable collection of patterns likely to produce the examples in a simple and natural probabilistic model. We show that this criterion is equivalent to the so-called Minimum Description Length (MDL) principle. We present a polynomial-time algorithm for approximating the optimal cover within a logarithmic factor and prove its performance guarantees. In the case of substring patterns, the running time of the algorithm is almost linear.
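The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It applies the standard greedy heuristic for weighted set cover, which is known to approximate the optimal cover within a logarithmic factor, to substring patterns. The function names, the `max_len` bound, and the cost function `1 + len(p)` (a crude stand-in for a description-length cost) are all assumptions made for this sketch.

```python
def substrings(seq, max_len):
    """Return all distinct substrings of seq with length 1..max_len."""
    return {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + 1, min(i + max_len, len(seq)) + 1)
    }


def greedy_cover(sequences, max_len=4, cost=lambda p: 1 + len(p)):
    """Greedy weighted set cover over substring patterns.

    Hypothetical sketch: a pattern "covers" a sequence if it occurs in it
    as a substring; `cost` is an assumed stand-in for a description-length
    cost, not the cost function used in the paper.
    """
    # Map each candidate pattern to the indices of the sequences it covers.
    candidates = {}
    for idx, seq in enumerate(sequences):
        for p in substrings(seq, max_len):
            candidates.setdefault(p, set()).add(idx)

    # Empty sequences contain no substring pattern and are left out.
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered:
        # Pick the pattern with the lowest cost per newly covered sequence;
        # ties are broken lexicographically to keep the result deterministic.
        best = min(
            (p for p in candidates if candidates[p] & uncovered),
            key=lambda p: (cost(p) / len(candidates[p] & uncovered), p),
        )
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


if __name__ == "__main__":
    seqs = ["abcde", "xbcdy", "zzbcd", "qrs"]
    print(greedy_cover(seqs))
```

With this simple cost the heuristic favours short patterns shared by many sequences. A cost that also charges for encoding each sequence given its pattern, as an MDL criterion would, rewards longer and more specific patterns.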


Similar resources

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...


Multigranulation single valued neutrosophic covering-based rough sets and their applications to multi-criteria group decision making

In this paper, three types of (philosophical, optimistic and pessimistic) multigranulation single valued neutrosophic (SVN) covering-based rough set models are presented, and these three models are applied to the problem of multi-criteria group decision making (MCGDM). Firstly, a type of SVN covering-based rough set model is proposed. Based on this rough set model, three types of mult...


A set-covering formulation for a drayage problem with single and double container loads

This paper addresses a drayage problem, which is motivated by the case study of a real carrier. Its trucks carry one or two containers from a port to importers and from exporters to the port. Since up to four customers can be served in each route, we propose a set-covering formulation for this problem where all possible routes are enumerated. This model can be efficiently solved to optimality b...


Capacitated Single Allocation P-Hub Covering Problem in Multi-modal Network Using Tabu Search

The goals of hub location problems are finding the location of hub facilities and determining the allocation of non-hub nodes to these located hubs. In this work, we discuss the multi-modal single allocation capacitated p-hub covering problem over fully interconnected hub networks. Therefore, we provide a formulation to this end. The purpose of our model is to find the location of hubs and the ...


IRF and ISRF Sequences and their Anti-Pedagogical Value

Initiation, Response, and Feedback (IRF) sequences are the most frequent interaction network in any classroom context. IRF sequences have been examined profusely in previous studies and were reported to be negatively correlated with participation opportunities (Kasper, 2006; Cazden, 2001; Ellis, 1994). In all these studies, all contingent factors of any classroom context which might influence in...




Publication date: 1995